Skip to content

[PGO] Add llvm.loop.estimated_trip_count metadata #148758

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jul 31, 2025

Conversation

jdenny-ornl
Copy link
Collaborator

This patch implements the llvm.loop.estimated_trip_count metadata discussed in [RFC] Fix Loop Transformations to Preserve Block Frequencies. As suggested in the RFC comments, it adds the new metadata to all loops at the time of profile ingestion and estimates each trip count from the loop's branch_weights metadata. As suggested in the PR #128785 review, it does so via a new PGOEstimateTripCountsPass pass, which creates the new metadata for each loop but omits the value if it cannot estimate a trip count due to the loop's form.

An important observation not previously discussed is that PGOEstimateTripCountsPass often cannot estimate a loop's trip count, but later passes can sometimes transform the loop in a way that makes it possible. Currently, such passes do not necessarily update the metadata, but eventually that should be fixed. Until then, if the new metadata has no value, llvm::getLoopEstimatedTripCount disregards it and tries again to estimate the trip count from the loop's current branch_weights metadata.

This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights`
metadata.  As [suggested in the PR#128785
review](#128785 (comment)),
it does so via a `PGOEstimateTripCountsPass` pass, which creates the
new metadata for the loop but omits the value if it cannot estimate a
trip count due to the loop's form.

An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip
count but later passes can transform the loop in a way that makes it
possible.  Currently, such passes do not necessarily update the
metadata, but eventually that should be fixed.  Until then, if the new
metadata has no value, `llvm::getLoopEstimatedTripCount` disregards it
and tries again to estimate the trip count from the loop's
`branch_weights` metadata.
@llvmbot llvmbot added PGO Profile Guided Optimizations llvm:ir llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Jul 15, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 15, 2025

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-llvm-analysis

Author: Joel E. Denny (jdenny-ornl)

Changes

This patch implements the llvm.loop.estimated_trip_count metadata discussed in [RFC] Fix Loop Transformations to Preserve Block Frequencies. As suggested in the RFC comments, it adds the new metadata to all loops at the time of profile ingestion and estimates each trip count from the loop's branch_weights metadata. As suggested in the PR #128785 review, it does so via a new PGOEstimateTripCountsPass pass, which creates the new metadata for each loop but omits the value if it cannot estimate a trip count due to the loop's form.

An important observation not previously discussed is that PGOEstimateTripCountsPass often cannot estimate a loop's trip count, but later passes can sometimes transform the loop in a way that makes it possible. Currently, such passes do not necessarily update the metadata, but eventually that should be fixed. Until then, if the new metadata has no value, llvm::getLoopEstimatedTripCount disregards it and tries again to estimate the trip count from the loop's current branch_weights metadata.


Patch is 65.69 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/148758.diff

18 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+61)
  • (modified) llvm/include/llvm/Analysis/LoopInfo.h (+7-3)
  • (added) llvm/include/llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h (+24)
  • (modified) llvm/include/llvm/Transforms/Utils/LoopUtils.h (+65-20)
  • (modified) llvm/lib/Analysis/LoopInfo.cpp (+7-3)
  • (modified) llvm/lib/Passes/PassBuilder.cpp (+1)
  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+8-2)
  • (modified) llvm/lib/Passes/PassRegistry.def (+1)
  • (modified) llvm/lib/Transforms/Instrumentation/CMakeLists.txt (+1)
  • (added) llvm/lib/Transforms/Instrumentation/PGOEstimateTripCounts.cpp (+45)
  • (modified) llvm/lib/Transforms/Utils/LoopUtils.cpp (+135-51)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll (+29-2)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll (+6)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll (+25-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/check-prof-info.ll (+30-24)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr81872.ll (+8-6)
  • (modified) llvm/test/Transforms/LoopVectorize/branch-weights.ll (+31-25)
  • (added) llvm/test/Transforms/PGOProfile/pgo-estimate-trip-counts.ll (+240)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 8ea850af7a69b..5c5c6ec96cfc7 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -7933,6 +7933,67 @@ The attributes in this metadata is added to all followup loops of the
 loop distribution pass. See
 :ref:`Transformation Metadata <transformation-metadata>` for details.
 
+'``llvm.loop.estimated_trip_count``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata records an estimated trip count for the loop.  The first operand
+is the string ``llvm.loop.estimated_trip_count``.  The second operand is an
+integer specifying the count, which might be omitted for the reasons described
+below.  For example:
+
+.. code-block:: llvm
+
+   !0 = !{!"llvm.loop.estimated_trip_count", i32 8}
+   !1 = !{!"llvm.loop.estimated_trip_count"}
+
+Purpose
+"""""""
+
+A loop's estimated trip count is an estimate of the average number of loop
+iterations (specifically, the number of times the loop's header executes) each
+time execution reaches the loop.  It is usually only an estimate based on, for
+example, profile data.  The actual number of iterations might vary widely.
+
+The estimated trip count serves as a parameter for various loop transformations
+and typically helps estimate transformation cost.  For example, it can help
+determine how many iterations to peel or how aggressively to unroll.
+
+Initialization and Maintenance
+""""""""""""""""""""""""""""""
+
+The ``pgo-estimate-trip-counts`` pass typically runs immediately after profile
+ingestion to add this metadata to all loops.  It estimates each loop's trip
+count from the loop's ``branch_weights`` metadata.  This way of initially
+estimating trip counts appears to be useful for the passes that consume them.
+
+As passes transform existing loops and create new loops, they must be free to
+update and create ``branch_weights`` metadata to maintain accurate block
+frequencies.  Trip counts estimated from this new ``branch_weights`` metadata
+are not necessarily useful to the passes that consume them.  In general, when
+passes transform and create loops, they should separately estimate new trip
+counts from previously estimated trip counts, and they should record them by
+creating or updating this metadata.  For this or any other work involving
+estimated trip counts, passes should always call
+``llvm::getLoopEstimatedTripCount`` and ``llvm::setLoopEstimatedTripCount``.
+
+Missing Metadata and Values
+"""""""""""""""""""""""""""
+
+If the current implementation of ``pgo-estimate-trip-counts`` cannot estimate a
+trip count from the loop's ``branch_weights`` metadata due to the loop's form or
+due to missing profile data, it creates this metadata for the loop but omits the
+value.  This situation is currently common (e.g., the LLVM IR loop that Clang
+emits for a simple C ``for`` loop).  A later pass (e.g., ``loop-rotate``) might
+modify the loop's form in a way that enables estimating its trip count even if
+those modifications provably never impact the actual number of loop iterations.
+That later pass should then add an appropriate value to the metadata.
+
+However, not all such passes currently do so.  Thus, if this metadata has no
+value, ``llvm::getLoopEstimatedTripCount`` will disregard it and estimate the
+trip count from the loop's ``branch_weights`` metadata.  It does the same when
+the metadata is missing altogether, perhaps because ``pgo-estimate-trip-counts``
+was not specified in a minimal pass list to a tool like ``opt``.
+
 '``llvm.licm.disable``' Metadata
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
diff --git a/llvm/include/llvm/Analysis/LoopInfo.h b/llvm/include/llvm/Analysis/LoopInfo.h
index a7a6a2753709c..a06be573b5e01 100644
--- a/llvm/include/llvm/Analysis/LoopInfo.h
+++ b/llvm/include/llvm/Analysis/LoopInfo.h
@@ -637,9 +637,13 @@ LLVM_ABI std::optional<bool> getOptionalBoolLoopAttribute(const Loop *TheLoop,
 /// Returns true if Name is applied to TheLoop and enabled.
 LLVM_ABI bool getBooleanLoopAttribute(const Loop *TheLoop, StringRef Name);
 
-/// Find named metadata for a loop with an integer value.
-LLVM_ABI std::optional<int> getOptionalIntLoopAttribute(const Loop *TheLoop,
-                                                        StringRef Name);
+/// Find named metadata for a loop with an integer value.  Return
+/// \c std::nullopt if the metadata has no value or is missing altogether.  If
+/// \p Missing, set \c *Missing to indicate whether the metadata is missing
+/// altogether.
+LLVM_ABI std::optional<int>
+getOptionalIntLoopAttribute(const Loop *TheLoop, StringRef Name,
+                            bool *Missing = nullptr);
 
 /// Find named metadata for a loop with an integer value. Return \p Default if
 /// not set.
diff --git a/llvm/include/llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h b/llvm/include/llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h
new file mode 100644
index 0000000000000..1b35c1c77e5c3
--- /dev/null
+++ b/llvm/include/llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h
@@ -0,0 +1,24 @@
+//===- PGOEstimateTripCounts.h ----------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_INSTRUMENTATION_PGOESTIMATETRIPCOUNTS_H
+#define LLVM_TRANSFORMS_INSTRUMENTATION_PGOESTIMATETRIPCOUNTS_H
+
+#include "llvm/IR/PassManager.h"
+
+namespace llvm {
+
+struct PGOEstimateTripCountsPass
+    : public PassInfoMixin<PGOEstimateTripCountsPass> {
+  PGOEstimateTripCountsPass() {}
+  PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
+};
+
+} // namespace llvm
+
+#endif // LLVM_TRANSFORMS_INSTRUMENTATION_PGOESTIMATETRIPCOUNTS_H
diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index e4d2f9d191707..7d03fb0d81e4c 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -316,28 +316,73 @@ LLVM_ABI TransformationMode hasDistributeTransformation(const Loop *L);
 LLVM_ABI TransformationMode hasLICMVersioningTransformation(const Loop *L);
 /// @}
 
-/// Set input string into loop metadata by keeping other values intact.
-/// If the string is already in loop metadata update value if it is
-/// different.
-LLVM_ABI void addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
-                                      unsigned V = 0);
-
-/// Returns a loop's estimated trip count based on branch weight metadata.
-/// In addition if \p EstimatedLoopInvocationWeight is not null it is
-/// initialized with weight of loop's latch leading to the exit.
-/// Returns a valid positive trip count, saturated at UINT_MAX, or std::nullopt
-/// when a meaningful estimate cannot be made.
+/// Set the string \p MDString into the loop metadata of \p TheLoop while
+/// keeping other loop metadata intact.  Set \p *V as its value, or set it
+/// without a value if \p V is \c std::nullopt to indicate the value is unknown.
+/// If \p MDString is already in the loop metadata, update it if its value (or
+/// lack of value) is different.  Return true if metadata was changed.
+LLVM_ABI bool addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
+                                      std::optional<unsigned> V = 0);
+
+/// Return either:
+/// - The value of \c llvm.loop.estimated_trip_count from the loop metadata of
+///   \p L, if that metadata is present and has a value.
+/// - Else, a new estimate of the trip count from the latch branch weights of
+///   \p L, if the estimation's implementation is able to handle the loop form
+///   of \p L (e.g., \p L must have a latch block that controls the loop exit).
+/// - Else, \c std::nullopt.
+///
+/// An estimated trip count is always a valid positive trip count, saturated at
+/// \c UINT_MAX.
+///
+/// Via \c LLVM_DEBUG, emit diagnostics that include "WARNING" when the metadata
+/// is in an unexpected state as that indicates some transformation has
+/// corrupted it.  If \p DbgForInit, expect the metadata to be missing.
+/// Otherwise, expect the metadata to be present, and expect it to have no value
+/// only if the trip count is currently inestimable from the latch branch
+/// weights.
+///
+/// In addition, if \p EstimatedLoopInvocationWeight, then either:
+/// - Set \p *EstimatedLoopInvocationWeight to the weight of the latch's branch
+///   to the loop exit.
+/// - Do not set it and return \c std::nullopt if the current implementation
+///   cannot compute that weight (e.g., if \p L does not have a latch block that
+///   controls the loop exit) or the weight is zero (because zero cannot be
+///   used to compute new branch weights that reflect the estimated trip count).
+///
+/// TODO: Eventually, once all passes have migrated away from setting branch
+/// weights to indicate estimated trip counts, this function will drop the
+/// \p EstimatedLoopInvocationWeight parameter.
 LLVM_ABI std::optional<unsigned>
 getLoopEstimatedTripCount(Loop *L,
-                          unsigned *EstimatedLoopInvocationWeight = nullptr);
-
-/// Set a loop's branch weight metadata to reflect that loop has \p
-/// EstimatedTripCount iterations and \p EstimatedLoopInvocationWeight exits
-/// through latch. Returns true if metadata is successfully updated, false
-/// otherwise. Note that loop must have a latch block which controls loop exit
-/// in order to succeed.
-LLVM_ABI bool setLoopEstimatedTripCount(Loop *L, unsigned EstimatedTripCount,
-                                        unsigned EstimatedLoopInvocationWeight);
+                          unsigned *EstimatedLoopInvocationWeight = nullptr,
+                          bool DbgForInit = false);
+
+/// Set \c llvm.loop.estimated_trip_count with the value \c *EstimatedTripCount
+/// in the loop metadata of \p L, or set it without a value if
+/// \c !EstimatedTripCount to indicate that \c getLoopEstimatedTripCount cannot
+/// estimate the trip count from latch branch weights.  If
+/// \c !EstimatedTripCount but \c getLoopEstimatedTripCount can estimate the
+/// trip counts, future calls to \c getLoopEstimatedTripCount will diagnose the
+/// metadata as corrupt.
+///
+/// In addition, if \p EstimatedLoopInvocationWeight, set the branch weight
+/// metadata of \p L to reflect that \p L has an estimated
+/// \c *EstimatedTripCount iterations and has \c *EstimatedLoopInvocationWeight
+/// exit weight through the loop's latch.
+///
+/// Return false if \c llvm.loop.estimated_trip_count was already set according
+/// to \p EstimatedTripCount and so was not updated.  Return false if
+/// \p EstimatedLoopInvocationWeight and if branch weight metadata could not be
+/// successfully updated (e.g., if \p L does not have a latch block that
+/// controls the loop exit).  Otherwise, return true.
+///
+/// TODO: Eventually, once all passes have migrated away from setting branch
+/// weights to indicate estimated trip counts, this function will drop the
+/// \p EstimatedLoopInvocationWeight parameter.
+LLVM_ABI bool setLoopEstimatedTripCount(
+    Loop *L, std::optional<unsigned> EstimatedTripCount,
+    std::optional<unsigned> EstimatedLoopInvocationWeight = std::nullopt);
 
 /// Check inner loop (L) backedge count is known to be invariant on all
 /// iterations of its outer loop. If the loop has no parent, this is trivially
diff --git a/llvm/lib/Analysis/LoopInfo.cpp b/llvm/lib/Analysis/LoopInfo.cpp
index 901cfe03ecd33..ba2c30b3c4764 100644
--- a/llvm/lib/Analysis/LoopInfo.cpp
+++ b/llvm/lib/Analysis/LoopInfo.cpp
@@ -1112,9 +1112,13 @@ bool llvm::getBooleanLoopAttribute(const Loop *TheLoop, StringRef Name) {
 }
 
 std::optional<int> llvm::getOptionalIntLoopAttribute(const Loop *TheLoop,
-                                                     StringRef Name) {
-  const MDOperand *AttrMD =
-      findStringMetadataForLoop(TheLoop, Name).value_or(nullptr);
+                                                     StringRef Name,
+                                                     bool *Missing) {
+  std::optional<const MDOperand *> AttrMDOpt =
+      findStringMetadataForLoop(TheLoop, Name);
+  if (Missing)
+    *Missing = !AttrMDOpt;
+  const MDOperand *AttrMD = AttrMDOpt.value_or(nullptr);
   if (!AttrMD)
     return std::nullopt;
 
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 572e5f19a1972..f593c5bba7573 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -248,6 +248,7 @@
 #include "llvm/Transforms/Instrumentation/NumericalStabilitySanitizer.h"
 #include "llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h"
 #include "llvm/Transforms/Instrumentation/PGOCtxProfLowering.h"
+#include "llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h"
 #include "llvm/Transforms/Instrumentation/PGOForceFunctionAttrs.h"
 #include "llvm/Transforms/Instrumentation/PGOInstrumentation.h"
 #include "llvm/Transforms/Instrumentation/RealtimeSanitizer.h"
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 98821bb1408a7..fc0d88e710426 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -80,6 +80,7 @@
 #include "llvm/Transforms/Instrumentation/MemProfUse.h"
 #include "llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h"
 #include "llvm/Transforms/Instrumentation/PGOCtxProfLowering.h"
+#include "llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h"
 #include "llvm/Transforms/Instrumentation/PGOForceFunctionAttrs.h"
 #include "llvm/Transforms/Instrumentation/PGOInstrumentation.h"
 #include "llvm/Transforms/Scalar/ADCE.h"
@@ -1268,8 +1269,13 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
     MPM.addPass(MemProfUsePass(PGOOpt->MemoryProfile, PGOOpt->FS));
 
   if (PGOOpt && (PGOOpt->Action == PGOOptions::IRUse ||
-                 PGOOpt->Action == PGOOptions::SampleUse))
+                 PGOOpt->Action == PGOOptions::SampleUse)) {
     MPM.addPass(PGOForceFunctionAttrsPass(PGOOpt->ColdOptType));
+    // TODO: Is this the right place for this pass?  Should we enable it in any
+    // other case, such as when __builtin_expect_with_probability or
+    // __builtin_expect appears in the source code but profiles are not read?
+    MPM.addPass(PGOEstimateTripCountsPass());
+  }
 
   MPM.addPass(AlwaysInlinerPass(/*InsertLifetimeIntrinsics=*/true));
 
@@ -2355,4 +2361,4 @@ AAManager PassBuilder::buildDefaultAAPipeline() {
 bool PassBuilder::isInstrumentedPGOUse() const {
   return (PGOOpt && PGOOpt->Action == PGOOptions::IRUse) ||
          !UseCtxProfile.empty();
-}
\ No newline at end of file
+}
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index 96250772da4a0..6e7ac959f57f4 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -124,6 +124,7 @@ MODULE_PASS("openmp-opt", OpenMPOptPass())
 MODULE_PASS("openmp-opt-postlink",
             OpenMPOptPass(ThinOrFullLTOPhase::FullLTOPostLink))
 MODULE_PASS("partial-inliner", PartialInlinerPass())
+MODULE_PASS("pgo-estimate-trip-counts", PGOEstimateTripCountsPass())
 MODULE_PASS("pgo-icall-prom", PGOIndirectCallPromotion())
 MODULE_PASS("pgo-instr-gen", PGOInstrumentationGen())
 MODULE_PASS("pgo-instr-use", PGOInstrumentationUse())
diff --git a/llvm/lib/Transforms/Instrumentation/CMakeLists.txt b/llvm/lib/Transforms/Instrumentation/CMakeLists.txt
index 15fd421a41b0f..0a97ed4b51e69 100644
--- a/llvm/lib/Transforms/Instrumentation/CMakeLists.txt
+++ b/llvm/lib/Transforms/Instrumentation/CMakeLists.txt
@@ -16,6 +16,7 @@ add_llvm_component_library(LLVMInstrumentation
   LowerAllowCheckPass.cpp
   PGOCtxProfFlattening.cpp
   PGOCtxProfLowering.cpp
+  PGOEstimateTripCounts.cpp
   PGOForceFunctionAttrs.cpp
   PGOInstrumentation.cpp
   PGOMemOPSizeOpt.cpp
diff --git a/llvm/lib/Transforms/Instrumentation/PGOEstimateTripCounts.cpp b/llvm/lib/Transforms/Instrumentation/PGOEstimateTripCounts.cpp
new file mode 100644
index 0000000000000..762aca0b897ce
--- /dev/null
+++ b/llvm/lib/Transforms/Instrumentation/PGOEstimateTripCounts.cpp
@@ -0,0 +1,45 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Transforms/Instrumentation/PGOEstimateTripCounts.h"
+#include "llvm/Analysis/LoopInfo.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Transforms/Utils/LoopUtils.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "pgo-estimate-trip-counts"
+
+static bool runOnLoop(Loop *L) {
+  bool MadeChange = false;
+  std::optional<unsigned> TC = getLoopEstimatedTripCount(
+      L, /*EstimatedLoopInvocationWeight=*/nullptr, /*DbgForInit=*/true);
+  MadeChange |= setLoopEstimatedTripCount(L, TC);
+  for (Loop *SL : *L)
+    MadeChange |= runOnLoop(SL);
+  return MadeChange;
+}
+
+PreservedAnalyses PGOEstimateTripCountsPass::run(Module &M,
+                                                 ModuleAnalysisManager &AM) {
+  FunctionAnalysisManager &FAM =
+      AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
+  bool MadeChange = false;
+  LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": start\n");
+  for (Function &F : M) {
+    if (F.isDeclaration())
+      continue;
+    LoopInfo *LI = &FAM.getResult<LoopAnalysis>(F);
+    if (!LI)
+      continue;
+    for (Loop *L : *LI)
+      MadeChange |= runOnLoop(L);
+  }
+  LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": end\n");
+  return MadeChange ? PreservedAnalyses::none() : PreservedAnalyses::all();
+}
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 200d1fb854155..50530590cf368 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -54,6 +54,8 @@ using namespace llvm::PatternMatch;
 
 static const char *LLVMLoopDisableNonforced = "llvm.loop.disable_nonforced";
 static const char *LLVMLoopDisableLICM = "llvm.licm.disable";
+static const char *LLVMLoopEstimatedTripCount =
+    "llvm.loop.estimated_trip_count";
 
 bool llvm::formDedicatedExitBlocks(Loop *L, DominatorTree *DT, LoopInfo *LI,
                                    MemorySSAUpdater *MSSAU,
@@ -201,34 +203,40 @@ void llvm::initializeLoopPassPass(PassRegistry &Registry) {
 }
 
 /// Create MDNode for input string.
-static MDNode *createStringMetadata(Loop *TheLoop, StringRef Name, unsigned V) {
+static MDNode *createStringMetadata(Loop *TheLoop, StringRef Name,
+                                    std::optional<unsigned> V) {
   LLVMContext &Context = TheLoop->getHeader()->getContext();
-  Metadata *MDs[] = {
-      MDString::get(Context, Name),
-      ConstantAsMetadata::get(ConstantInt::get(Type::getInt32Ty(Context), V))};
-  return MDNode::get(Context, MDs);
+  if (V) {
+    Metadata *MDs[] = {MDString::get(Context, Name),
+                       ConstantAsMetadata::get(
+                           ConstantInt::get(Type::getInt32Ty(Context), *V))};
+    return MDNode::get(Context, MDs);
+  }
+  return MDNode::get(Context, {MDString::get(Context, Name)});
 }
 
-/// Set input string into loop metadata by keeping other values intact.
-/// If the string is already in loop metadata update value if it is
-/// different.
-void llvm::addStringMetadataToLoop(Loop *TheLoop, const char *StringMD,
-                                   unsigned V) {
+bool llvm::addStringMetadataToLoop(Loop *TheLoop, const char *StringMD,
+                                   std::optional<unsigned> V) {
   SmallVector<Metadata *, 4> MDs(1);
   // If the loop already has metadata, retain it.
   MDNode *LoopID = TheLoop->getLoopID();
   if (LoopID) {
     for (unsigned i = 1, ie = LoopID->getNumOperands(); i < ie; ++i) {
       MDNode *Node = cast<MDNode>(LoopID->getOperand(i));
-      // If it is of form key = value, try to parse it.
-      if (Node->getNumOperands() == 2) {
+      // If it is of form key [= value], try to parse it.
+      unsigned NumOps = Node->getNumOperands();
+   ...
[truncated]

Copy link
Member

@mtrofin mtrofin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still reading through, and sorry for the delay on my end, I was on vacation. One high level question, should this maybe also touch the bitcode validator?

@jdenny-ornl
Copy link
Collaborator Author

still reading through, and sorry for the delay on my end, I was on vacation.

No worries. I'm still catching up after a vacation as well.

One high level question, should this maybe also touch the bitcode validator?

Are you suggesting something similar to your ProfileVerifierPass?

@mtrofin
Copy link
Member

mtrofin commented Jul 22, 2025

still reading through, and sorry for the delay on my end, I was on vacation.

No worries. I'm still catching up after a vacation as well.

One high level question, should this maybe also touch the bitcode validator?

Are you suggesting something similar to your ProfileVerifierPass?

No - llvm/lib/IR/Verifier.cpp rather. See e.g. visitProfMetadata there. You could get it to verify that the new metadata is well-formed (like, has the right number of arguments)

@jdenny-ornl
Copy link
Collaborator Author

No - llvm/lib/IR/Verifier.cpp rather. See e.g. visitProfMetadata there. You could get it to verify that the new metadata is well-formed (like, has the right number of arguments)

Done. Let me know if it's what you had in mind.

Somehow, on some of my builds, `llvm::` prefixes are dropped from some
symbol names in the printed past list.
@jdenny-ornl jdenny-ornl merged commit f7b6501 into main Jul 31, 2025
10 checks passed
@jdenny-ornl jdenny-ornl deleted the users/jdenny-ornl/pgo-estimated-trip-count branch July 31, 2025 16:28
@jdenny-ornl
Copy link
Collaborator Author

Thanks for the review.

@jdenny-ornl jdenny-ornl restored the users/jdenny-ornl/pgo-estimated-trip-count branch July 31, 2025 16:39
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes a substantial compile-time regression: https://llvm-compile-time-tracker.com/compare.php?from=3e579d93ab50952628a51bda05f3a39f6a5a631c&to=f7b65011de519b1bd987892475db61f99dde44ce&stat=instructions%3Au

The new pass looks somewhat problematic to me, from a pipeline perspective. Even if the whole-world invalidation is fixed, this is still going to compute DT and LI in a pretty random part of the optimization pipeline, where these will probably get invalidated before they can be reused.

I'm not super clear on what the benefits of this approach with an early inference pass are. I like this conceptually, but if in practice we can usually only infer the estimated trip counts after loop rotation, is the early pass really useful?

It's probably worth pointing out that just the presence of loop metadata impacts optimization behavior (in an attempt to preserve the metadata), so inserting dummy metadata for all loops will have secondary effects.

I'd probably suggest reverting this for now, we need to at least sort out the impact on the optimization pipeline.

jdenny-ornl added a commit that referenced this pull request Jul 31, 2025
This reverts commit 3a18fe3.

So that we can revert PR #148758.
jdenny-ornl added a commit that referenced this pull request Jul 31, 2025
jdenny-ornl added a commit that referenced this pull request Jul 31, 2025
@jdenny-ornl
Copy link
Collaborator Author

Thanks for the explanation. I think I have it reverted properly.

@mtrofin What are your thoughts?

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 31, 2025
@rupprecht
Copy link
Collaborator

Less of an issue than the compile time regression, but this also introduced a layering violation between IR and TransformUtils. ~every file in lib/Transforms/Utils/ uses llvm/IR/*.h, and this commit added the reverse dependency as now llvm/lib/IR/Verifier.cpp includes llvm/Transforms/Utils/LoopUtils.h.

@mtrofin
Copy link
Member

mtrofin commented Jul 31, 2025

Thanks for the explanation. I think I have it reverted properly.

@mtrofin What are your thoughts?

Could this be enabled behind a flag? IIRC there are a few changes needed here, besides this one; after which presumably we could try it out on some benchmarks.

@jdenny-ornl
Copy link
Collaborator Author

jdenny-ornl commented Jul 31, 2025

Less of an issue than the compile time regression, but this also introduced a layering violation between IR and TransformUtils. ~every file in lib/Transforms/Utils/ uses llvm/IR/*.h, and this commit added the reverse dependency as now llvm/lib/IR/Verifier.cpp includes llvm/Transforms/Utils/LoopUtils.h.

That just enables the verifier and utils to share LLVMLoopEstimatedTripCount = "llvm.loop.estimated_trip_count". What's the right way to do that?

Could this be enabled behind a flag? IIRC there are a few changes needed here, besides this one; after which presumably we could try it out on some benchmarks.

If I understood correctly, the problems @nikic pointed out are all related to the new pass. Even with this PR, getLoopEstimatedTripCount still estimates trip counts from branch weights when the metadata is missing, so my patches to fix BFI issues do not need the pass. For my purposes then, yes, the pass can go behind a flag so that the pass can be investigated as an orthogonal issue.

I'm not super clear on what the benefits of this approach with an early inference pass are. I like this conceptually, but if in practice we can usually only infer the estimated trip counts after loop rotation, is the early pass really useful?

@mtrofin Would you comment on this point?

It's probably worth pointing out that just the presence of loop metadata impacts optimization behavior (in an attempt to preserve the metadata), so inserting dummy metadata for all loops will have secondary effects.

One possibility is for the pass to add the new metadata only to loops that have branch weights. That way, you would only pay for it where you are doing PGO-like work. The PR currently adds the metadata without a value [EDIT: when a loop has no branch weights].

@rupprecht
Copy link
Collaborator

Less of an issue than the compile time regression, but this also introduced a layering violation between IR and TransformUtils. ~every file in lib/Transforms/Utils/ uses llvm/IR/*.h, and this commit added the reverse dependency as now llvm/lib/IR/Verifier.cpp includes llvm/Transforms/Utils/LoopUtils.h.

That just enables the verifier and utils to share LLVMLoopEstimatedTripCount = "llvm.loop.estimated_trip_count". What's the right way to do that?

I really don't know what the best way to do this is, but here are a couple possibilities:

  • Define LLVMLoopEstimatedTripCount in IR and have TransformUtils get it from there.
  • Put LLVMLoopEstimatedTripCount in some lower level library (or create a new one) that both IR and TransformUtils depend on (e.g. Support)
  • Just duplicate the string constant, possibly w/ a test to ensure they're the same value
  • Pull Verifier.cpp out into a separate library (e.g. create IRVerifier), assuming the rest of IR doesn't depend on it.

btw, see also https://llvm.org/docs/CodingStandards.html#library-layering

jdenny-ornl added a commit to jdenny-ornl/llvm-project that referenced this pull request Aug 8, 2025
jdenny-ornl added a commit to jdenny-ornl/llvm-project that referenced this pull request Aug 8, 2025
@jdenny-ornl
Copy link
Collaborator Author

I really don't know what the best way to do this is, but here are a couple possibilities:

* Define `LLVMLoopEstimatedTripCount` in IR and have TransformUtils get it from there.

Thanks. I went with that in PR #152775.

krishna2803 pushed a commit to krishna2803/llvm-project that referenced this pull request Aug 12, 2025
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR llvm#128785
review](llvm#128785 (comment)),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.

An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
krishna2803 pushed a commit to krishna2803/llvm-project that referenced this pull request Aug 12, 2025
This reverts commit 3a18fe3.

So that we can revert PR llvm#148758.
krishna2803 pushed a commit to krishna2803/llvm-project that referenced this pull request Aug 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding llvm:ir llvm:transforms PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants